Crystal structure of human chorionic gonadotropin

A. J. Lapthorn, D. C. Harris, A. Littlejohn, J. W. Lustbader*,
R. E. Canfield, K. J. Machin*, F. J. Morgan†‡ & N. W. Isaacs

Department of Chemistry, University of Glasgow, Glasgow G12 8QQ, UK 

* Department of Medicine, Columbia University, New York 10032, USA 

† St Vincent's Institute of Medical Research, Melbourne, Victoria 3065, Australia

‡ Department of Biochemistry, LaTrobe University, Melbourne, Victoria 3083, Australia

The three-dimensional structure of human chorionic gonadotropin shows that each of its two
different subunits has a similar topology, with three disulphide bonds forming a cystine knot.
This same folding motif is found in some protein growth factors. The heterodimer is stabilized
by a segment of the β-subunit which wraps around the α-subunit and is covalently linked like
a seat belt by the disulphide Cys 26-Cys 110. This extraordinary feature appears to be essential
not only for the association of these heterodimers but also for receptor binding by the
glycoprotein hormones.



In humans the early stages of pregnancy are maintained by the
hormone chorionic gonadotropin (hCG) from the conceptus.
This hormone is structurally related to the anterior pituitary
gonadotropins, follicle-stimulating hormone (FSH) and luteinizing
hormone (LH), which regulate the cellular and endocrine
function of the ovary and testis. Together with thyroid-stimulating
hormone (TSH) they constitute the family of glycoprotein
hormones. These consist of two subunits, α and β, which associate
non-covalently to form a heterodimer; in a given species the
α-subunits are identical and the β-subunits are different (but
homologous) for the different hormones. Although the heterodimer
is required for receptor binding, it is the β-subunit that
determines the specific activity of each hormone. The hormones
are all glycosylated with N-linked complex carbohydrates, which
confer heterogeneity on a given hormone; hCG is the most heavily
glycosylated, with four additional O-linked carbohydrates on
the serine-rich C-terminal extension of the β-subunit. In human
glycoprotein hormones, the common α-subunit contains 92
amino acids, with 10 half-cystine residues which form five
intramolecular disulphide linkages. The β-subunits vary in size from
114 residues in LH to 145 in CG and contain 12 half-cystines
which form six conserved disulphide bridges. There is a high
degree of sequence similarity in the first 114 amino acids between
hCG and the other hormones (LH 85%; FSH 36%; TSH 46%).
The homology between hCG and LH reflects a common biological
function, as both proteins bind the same receptor; FSH and
TSH bind to structurally similar but distinct receptors.
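
As an illustration of how percentages such as those quoted above can be obtained, the following minimal sketch computes percent identity over a pre-aligned pair of sequences. It assumes the quoted figures are simple identities, which the text does not state; the function name `percent_identity` and the two toy strings are hypothetical and are not real hormone sequences.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned, non-gap positions carrying identical residues.

    Both sequences must already be aligned to the same length;
    columns with a gap ('-') in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # gap column: not counted
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * identical / compared


# Toy strings for illustration only (not real hormone sequences).
toy_a = "ACDEFGHIKL"
toy_b = "ACDQFGHLKL"
print(percent_identity(toy_a, toy_b))
```

A real comparison would first need an alignment of the β-subunit sequences over the first 114 residues; that step is outside this sketch.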

Here we present the crystal structure of hCG, report the correct
disulphide pairings in both subunits, and describe the overall
protein fold and the dimer association that involves a structurally
unique 'seat-belt' arrangement. The receptor-binding site is
described. We show that both subunits of glycoprotein hormones
are members of a structural superfamily of cystine-knot
growth factors.

Structure determination and refinement 

It was previously thought that heterogeneity of carbohydrate
prevented any of the glycoprotein hormones from crystallizing.
The bulk of the carbohydrate can be removed from hCG by
treatment with anhydrous hydrofluoric acid, leaving the
deglycosylated protein still able to bind receptors,
although post-translational effects are not triggered 1. Circular
dichroism shows that the structures of the deglycosylated and
native hormones are identical 2. Using the crystallization conditions
for HF-treated hCG (HF-hCG) 3, isomorphous crystals of
neuraminidase-treated hCG have been grown in which only the
terminal sialic acid residues have been removed 4.

HF-treated hCG crystallizes from ammonium sulphate
solutions as hexagonal bipyramids of space group P6₅22 and
cell dimensions a = 88.68, c = 177.24 Å. The crystals diffract to
3.5 Å resolution on a laboratory X-ray source and to 2.5 Å with
synchrotron radiation, but are very sensitive to radiation damage.
The structure has been determined to 3.0 Å resolution by
multiple isomorphous replacement (MIR) and maximum
entropy (ME) solvent flattening; Table 1 gives the details and
statistics for the phasing procedure (details of the ME phasing
procedure will be published elsewhere).
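
The cell dimensions above fix the unit-cell volume and the spacing of any lattice plane. The sketch below applies the standard hexagonal-lattice formulas to the quoted a and c values; the function names are hypothetical, and nothing here is taken from the authors' own software.

```python
import math

A_AXIS = 88.68   # Å, cell dimension a quoted in the text
C_AXIS = 177.24  # Å, cell dimension c quoted in the text


def hexagonal_cell_volume(a: float, c: float) -> float:
    """Volume of a hexagonal unit cell: V = (sqrt(3) / 2) * a^2 * c."""
    return math.sqrt(3.0) / 2.0 * a * a * c


def hexagonal_d_spacing(h: int, k: int, l: int, a: float, c: float) -> float:
    """Interplanar spacing d(hkl) for a hexagonal lattice.

    1/d^2 = (4/3) * (h^2 + h*k + k^2) / a^2 + l^2 / c^2
    """
    inv_d_squared = (4.0 / 3.0) * (h * h + h * k + k * k) / (a * a) + (l * l) / (c * c)
    if inv_d_squared == 0.0:
        raise ValueError("(0, 0, 0) is not a lattice plane")
    return 1.0 / math.sqrt(inv_d_squared)


volume = hexagonal_cell_volume(A_AXIS, C_AXIS)
print(f"cell volume: {volume:.0f} cubic angstroms")
print(f"d(1,0,0): {hexagonal_d_spacing(1, 0, 0, A_AXIS, C_AXIS):.2f} angstroms")
```

Reflections with d-spacings down to 3.0 Å correspond to the resolution limit used for the structure determination.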

The final electron density map allows a tracing of the α-subunit
from residues 5 to 89 and the β-subunit from 2 to 111. The
residue β111 is open to a large 50 Å diameter solvent channel
and it is assumed that the remaining 34 C-terminal residues
adopt a random conformation and are therefore not visible in
the map.

The initial model was refined using the program XPLOR 5.
Five rounds of refinement and model building gave a final structure
with a crystallographic R factor of 21.8% for data from
10-3 Å. No solvent molecules, but six N-acetylglucosamine
molecules positioned in good density, have been included. A geometric
analysis of the refined structure with the program
PROCHECK 6 gave values better than expected for structures
determined at this resolution.
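
For readers unfamiliar with the statistic, the crystallographic R factor is conventionally the summed absolute difference between observed and calculated structure-factor amplitudes divided by the summed observed amplitudes. The sketch below assumes that conventional definition (the text does not spell it out) and assumes the calculated amplitudes are already on the scale of the observed ones; the amplitudes shown are invented toy numbers, not data from this structure.

```python
def r_factor(f_obs, f_calc):
    """Conventional R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|).

    Assumes f_calc has already been scaled to f_obs.
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    if denominator == 0:
        raise ValueError("observed amplitudes sum to zero")
    return numerator / denominator


# Toy amplitudes for illustration only.
toy_f_obs = [100.0, 50.0, 25.0, 10.0]
toy_f_calc = [90.0, 55.0, 20.0, 12.0]
print(f"R = {100 * r_factor(toy_f_obs, toy_f_calc):.1f}%")
```

A free R factor uses the same expression, evaluated only on reflections that were withheld from refinement.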

The disulphide bridges 

There are five disulphide bonds in the α-subunit and six in the
β-subunit. The chemical assignment of the disulphide pairings
has been extensively studied (reviewed in ref. 2). The assignments
of Mise and Bahl 7,8 are considered the most reliable, with α7-31,
α10-32, β93-100, β26-110 and perhaps β23-72 assumed to
be correct. From the ME phased map, some electron density
could be interpreted as disulphide linkages 23-72, 26-110 and
34-88 of the β-subunit. The remaining S-S bridges could not be
formed as expected (that is, between residues 38-57, 9-90) and
the map could be interpreted only if the linkages were from
residues 9 to 57 and 38 to 90, thereby forming an unusual knot
of three disulphide bridges. The density for the α-subunit contained
a similar cystine knot, and for the amino-acid sequence
to fit, disulphide linkages 7-31, 10-60, 28-82, 32-84 and 59-87
were necessary.

This cystine-knot motif 9 is found in three growth factors:
NGF 10, TGF-β2 11 and PDGF-BB 12. The motif involves three
disulphide bridges arranged so that two disulphides link adjacent
antiparallel strands of the peptide chain and form a ring penetrated
by the third disulphide (Fig. 1). The disulphide pairings
are: α(7-31, 10-60, 28-82, 32-84, 59-87) and β(9-57, 23-72,
26-110, 34-88, 38-90, 93-100) (Figs 1 and 2). Of the linkages
not previously predicted (α10-60, 28-82, 32-84; β9-57, 38-90),
all are involved in the cystine knots.
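
The knot topology can be checked mechanically from the pairings listed above. In this minimal sketch the six knot half-cystines of each subunit are ranked I to VI by sequence position and the pairing pattern is compared with the Cys(I)-Cys(IV), Cys(II)-Cys(V), Cys(III)-Cys(VI) arrangement; the choice of knot residues follows the text, while the function and variable names are hypothetical.

```python
ALPHA_PAIRS = [(7, 31), (10, 60), (28, 82), (32, 84), (59, 87)]
BETA_PAIRS = [(9, 57), (23, 72), (26, 110), (34, 88), (38, 90), (93, 100)]

# Half-cystines taking part in the knot, as identified in the text.
ALPHA_KNOT = [10, 28, 32, 60, 82, 84]
BETA_KNOT = [9, 34, 38, 57, 88, 90]

# Cys(I)-Cys(IV), Cys(II)-Cys(V), Cys(III)-Cys(VI)
KNOT_PATTERN = [(1, 4), (2, 5), (3, 6)]


def knot_pairing_ranks(pairs, knot_cysteines):
    """Return the disulphides among the knot cysteines as rank pairs (1..6)."""
    ordered = sorted(knot_cysteines)
    if len(ordered) != 6:
        raise ValueError("a cystine knot involves exactly six half-cystines")
    rank = {residue: index + 1 for index, residue in enumerate(ordered)}
    return sorted(
        tuple(sorted((rank[a], rank[b])))
        for a, b in pairs
        if a in rank and b in rank
    )


for name, pairs, knot in (("alpha", ALPHA_PAIRS, ALPHA_KNOT),
                          ("beta", BETA_PAIRS, BETA_KNOT)):
    ranks = knot_pairing_ranks(pairs, knot)
    print(name, ranks, ranks == KNOT_PATTERN)
```

Feeding in the pairings expected before the structure was solved (β38-57 and β9-90 in place of β9-57 and β38-90) gives a different rank pattern, which is one way to see why the map could not be interpreted with them.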

Structure of the protein subunits 

Figure 2a shows that the subunits have similar folds determined
largely by the central cystine knot. On one side of the knot
there is a loop of double-stranded β-sheet-like structure (the
long loop); on the other side there are two hairpin loops lying
in almost parallel planes. In the β-subunit, these hairpins are
linked by the Cys 23-72 disulphide bond.

In the C-terminal hairpin loops of each subunit, there is a
pair of β-sheet bulges at adjacent positions in each strand (α62-65,
79-82; β59-62, 85-88). In the β-subunit, the bulges have a
classical hydrogen-bonding pattern; in the α-subunit the pattern
is less complete. The structures of all the cystine-knot growth
factors have β-sheet bulges located immediately after Cys(V),
suggesting they are an integral component of this motif.

The long loop in the α-subunit consists of a stretch of antiparallel
β-sheet made up of residues 35-39 and 52-56. The three
residues Pro 38-Thr 39-Pro 40, which are conserved in all 18
α-subunit sequences determined, break the β-sheet away at an
angle of about 100° and lead into two turns of a 3₁₀-helix involving
residues 40-46. Asn 52 is also on a bend that directs the
N-linked carbohydrate on this residue out from the interface
between the α- and β-chains.

The corresponding region of the β-subunit forms a very open
loop with no main-chain hydrogen bonds between the antiparal- 



TABLE 1 Summary of crystallographic analysis

Diffraction data
Data set                  Native       K₂Pt(CN)₄  AgNO₃   AgNO₃   Hg(OOCH₃)₂  KAu(CN)₄  K₂Pt(CN)₄
Detector                  Xen./Film    Xen.       Xen.    Xen.    Film        Film      Film
Resolution (Å)            3.0          3.5        5.0     3.5     3.5         3.9       3.3
Unique reflections        8,365        3,169      1,096   3,664   2,473       2,272     2,949
Completeness (%)          94.9         58.9       81.6    70.6    44.0        55.2      52.4
Rmerge (%)                [illegible]  6.0        7.7     7.5     10.0        9.1       7.5

MIR phasing
Mean fractional ID                     0.126      0.094   0.115   0.080       0.148     0.073
Number of sites                        2          2       3       1           2         1
Rc                                     0.73       0.68    0.82    0.80        0.80      0.78
Phasing power, centric                 1.1        1.0     0.7     0.7         0.7       0.6
Phasing power, acentric                1.4        1.5     1.1     0.9         1.0       0.6

Phase combined FOM        0.52


Crystallization and data collection: Crystals were grown at 25 °C as described 3. More than 20 heavy-atom compounds were tested with a
variety of soak conditions to yield four useful derivatives. Diffraction data were collected on film on station 7.2 (λ = 1.48 Å) at the SERC Synchrotron
Radiation Source, Daresbury. Film data were processed using the MOSFLM program, and native and derivative data were scaled using SCALEIT
from the CCP4 suite of programs 40. Area detector data were collected on a Siemens/Xentronics P1000 area detector, and processed using the
XDS program 41. The final native dataset used for refinement, on which the current model is based, is a combination of a complete Xentronics
dataset to 3.5 Å resolution collected from a single crystal, merged with the less complete film data collected from several crystals, which extends
to 2.8 Å. From a consideration of the merging R factor statistics, only data to 3.0 Å were used. XEN, [illegible];
$R_{\mathrm{merge}} = \sum |I_i - \langle I \rangle| / \sum I_i$, where $I_i$ is the intensity of an individual reflection and $\langle I \rangle$ is the mean intensity of that reflection; ID, isomorphous differences.
MIR analysis: The Pt derivative was solved using SHELXS86 42 and the other derivatives by a combination of difference Fourier syntheses and
difference Patterson maps. There are essentially three sites; the major site shared by the Pt and Au derivatives is in a pocket on a 2-fold axis of
symmetry binding β45 Arg and its 2-fold symmetry equivalent. Both the primary Ag site and the only Hg site are very close ([illegible]
α29 Met and β41 Met) and the other Ag sites are at α83 His and α71 Met. The heavy-atom parameters were refined and MIR phases calculated
using MLPHARE for the six derivative datasets, using both isomorphous and anomalous differences to give a FOM of 0.52. The correct space group
was determined from refinement of heavy-atom sites in MLPHARE using anomalous data from the Pt derivative and confirmed by [illegible]
in the twist of β-sheets in the final structure. Cullis R-factor, $R_c = \sum ||F_{PH} \pm F_P| - F_H| / \sum |F_{PH} - F_P|$, where $F_P$ and $F_{PH}$ are the structure factor amplitudes
for native and derivative data respectively and $F_H$ is the calculated heavy-atom structure factor; phasing power = $\langle F_H \rangle / \langle E \rangle$, where $\langle F_H \rangle$ is the
r.m.s. of the heavy-atom scattering factor and $\langle E \rangle$ is the r.m.s. lack of closure; the summation is computed for reflections used in the heavy-atom
refinement cycle; FOM, figure of merit. Density modification and model building: The initial phases were refined at 3.5 Å resolution using the
Wang solvent-flattening procedure 43. A protocol of five cycles of solvent flattening following an envelope redetermination (55-60% solvent) was
used for 10 rounds of phase refinement. Phase extension from 4.0 Å using very small increments in resolution was used in an attempt to improve
the quality of the maps. Although ~75% of the protein chain, including the major β-strands in the structure, could be traced, the maps were not
sufficiently clear to define the subunit interface or to get a sequence alignment. Further density modification was achieved using the program
MICE 44. Using MIR-phased reflections with FOM ≥ 0.70 as basis-set reflections and all data to 3.0 Å resolution, maximum entropy solvent flattening
(MESF) was performed. Seven rounds of MESF were calculated with phase recombination computed using SIGMAA and envelope determination
using the Wang procedure. A round of phase permutation using 7 important unphased reflections gave phases for 5 of these, which were then
combined with the initial MIR basis-set reflections and an additional seven rounds of MESF computed. Partial structure phasing from 14 peptide
fragments, with side chains assigned according to size, was included in phase recombination in the later rounds of MESF. This led to a map which
was sufficiently unambiguous to allow the complete tracing of both subunits. Refinement: The polypeptide chain was built into a combined map
using the programs FRODO 46 and O 47. This model was refined at 3.5 Å using the program XPLOR 5. Five rounds of model rebuilding and refinement,
using both conventional positional refinement and simulated annealing, during which the resolution was extended to 3.0 Å, resulted in the current
model. The model comprises a total of 1,565 non-hydrogen atoms. Temperature factors were refined individually and then averaged by group with
a final overall B value of 26.5 Å². Data from 10 to 3.0 Å with F > 2σ(F) (7,877 reflections) were used in the final refinement to give an R factor of
21.8%, and a Free R factor (791 reflections) of 31.4%. The stereochemistry is typified by an r.m.s. bond deviation from ideality of [illegible] and
r.m.s. angles of 2.28°. All but one of the non-glycine φ and ψ angles lie in allowed regions in the Ramachandran plot, with 77% in the most
favoured regions. Atomic coordinates will be deposited in the Brookhaven Protein Data Bank.
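
The merging R factor defined in the legend can be sketched in a few lines. The example below groups repeated intensity measurements by reflection index and sums the absolute deviations from each reflection's mean; skipping reflections measured only once is an assumption of this sketch, as are the function name and the toy intensities, none of which come from the MOSFLM, XDS or SCALEIT processing described above.

```python
def r_merge(observations):
    """R_merge = sum |I_i - <I>| / sum I_i over repeated measurements.

    `observations` maps a reflection index (h, k, l) to the list of
    intensities measured for it. Reflections with a single measurement
    are skipped (an assumption of this sketch).
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in observations.values():
        if len(intensities) < 2:
            continue
        mean_intensity = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean_intensity) for i in intensities)
        denominator += sum(intensities)
    if denominator == 0.0:
        raise ValueError("no repeated measurements to merge")
    return numerator / denominator


# Toy intensities for illustration only.
toy_observations = {
    (1, 0, 0): [100.0, 110.0],
    (0, 0, 6): [50.0, 40.0, 45.0],
    (2, 1, 3): [75.0],  # single measurement, ignored
}
print(f"Rmerge = {100 * r_merge(toy_observations):.1f}%")
```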
FIG. 1 The structure of the α- and β-subunit cystine
knots shows the common fold that comprises three
disulphide bridges, arranged so that two disulphides
link adjacent parallel strands of the peptide chain and
form a ring through which the third disulphide penetrates.
a, The common arrangement of the half cystines
in the sequence 9 is shown for the β-subunit. If
the half cystines within the knot are assigned roman
numerals to indicate their order in the protein
sequence, disulphide bonds Cys(II)-Cys(V) and Cys(III)-
Cys(VI) form the ring through which the interpenetrating
disulphide Cys(I)-Cys(IV) is formed. There are two
general sequence requirements, Cys(II)-X-Gly-X-Cys(III)
and -Cys(V)-X-Cys(VI), although these are not absolute:
in NGF, the central Gly in the first motif is
replaced by a 7-amino-acid loop (Fig. 5). These
sequence patterns are found in each subunit of hCG:
α28-32 (CMGCC), α82-84 (CHC), β34-38 (CAGYC)
and β88-90 (CQC). In the isolated subunits, the disulphides
in the cystine knots show varying degrees of
solvent accessibility, as predicted 48. Cys(I)-Cys(IV) and
Cys(II)-Cys(V) are buried in the knot, but Cys(III)-Cys(VI)
is exposed to the solvent like the remaining disulphide
bridges. b, In the α-subunit there are additional disulphide
bridges (7-31 and 59-87) in close proximity to
the knot. These invariant residues seem to play a role
in formation of the heterodimer. There is a notable
similarity in overall structure in the knots and the only
significant differences, such as the dihedral angle of
Cys(III)-Cys(VI), are due to conformational restraints
imposed to accommodate the additional cystines in
α. c, The (2|F_obs| − |F_calc|) electron density map around
the cystine knot of the α-subunit is indicative of the
quality of the final map. The refined structure of the
protein is illustrated as an atomic stick figure, colour
coded for atom type: C, white; O, red; N, blue; S, yellow.
The electron density is contoured at a 1σ level.
lel strands. Instead, the loop is stabilized by side chain-to-main
chain hydrogen-bond interactions from Arg 43 to Pro 50 and
Leu 52 and from Gln 54 to Met 41. One side of the loop, residues
38 to 45, is held by association with the α-subunit. In the crystal,
a two-fold axis of symmetry causes the hydrophobic residues
48-50 to pack against the same residues from a neighbouring
molecule.

Apart from the cystine-knot motif, there is no sequence similarity
between the subunits, but when the cystine knots are superimposed,
the similarity in structure is remarkable (Fig. 2b). The
two molecules have a common core structure and differ mainly
in the extent of the loop regions. In the β-subunit the pair of
hairpin loops are extended in size, while the long loop is larger
in the α-subunit.

FIG. 2 The structure of each subunit
is shown as a Cα trace with disulphide
bridges. (The α-subunit is blue and the
β-subunit is green in this and all subsequent
figures.) Each molecule has a
similar fold consisting of two β-hairpin
loops on one side of a central cystine
knot and a long loop on the other. a,
The hairpin structure of each subunit
is stabilized by a hydrophobic core
extending between the two loops. The
α-subunit core consists of α17 Phe,
α18 Phe, α25 Ile, α68 Val, α70 Val,
α71 Met, α74 Phe and α76 Val, with
the three phenylalanines clustered
together and partially exposed to the
surface. The β-subunit core includes
the residues β16 Leu, β18 Val,
β27 Ile, β29 Val, β67 Ile, β69 Leu,
β80 Val and β82 Tyr, which partly
buries the disulphide bridge β23-72.
The two hairpins of α are essentially regular, apart from an unusual
bend at the invariant Asn 15 where the side chain hydrogen-bonds to
the main chain NH of residues 17 and 18, resulting in Pro 16, Phe 17
and Phe 18 bulging away from the antiparallel strand and forming a
prominent surface. Residues 90-111 are highlighted in red on the
β-subunit. This 'seat-belt' region is a novel structure essential for forming
the heterodimer. b, Superimposing the α- and β-subunits at the cystine
knot reveals the extent of their common structure. The Cα position of
22 residues extending from the knot can be superimposed with an r.m.s.
difference of 0.52 Å. Subunits are coloured as in a, with α-subunit
disulphides in orange.

FIG. 3 a, A stereo view of the complex of α- and
β-subunits (blue and green, respectively) showing
the disulphide bridges (yellow) and observed carbohydrate
(using colour coding on atom type). b, A
schematic representation of the central β-barrel.
One face consists of a short strand from the
β-subunit (97-102) forming a parallel β-sheet with
α(52-57), which is part of a four-stranded antiparallel
β-sheet as shown. The other face of the
barrel has two antiparallel strands (84-90, 63-56)
of the β-subunit.
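
The r.m.s. difference quoted for the superimposed Cα positions (0.52 Å over 22 residues, Fig. 2b) is a root-mean-square distance between matched atoms. The sketch below computes that quantity for two coordinate lists that are assumed to be already superimposed; it does not perform the least-squares fit itself, and the function name and toy coordinates are hypothetical.

```python
import math


def rms_difference(coords_a, coords_b):
    """Root-mean-square distance between matched (x, y, z) positions.

    Assumes the two coordinate sets are already superimposed and listed
    in the same residue order.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    if not coords_a:
        raise ValueError("coordinate lists are empty")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Toy coordinates in angstroms, for illustration only.
toy_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
toy_b = [(0.0, 0.0, 1.0), (1.0, 0.0, -1.0)]
print(rms_difference(toy_a, toy_b))
```

In practice the optimal rotation and translation would be found first (for example with a Kabsch-type least-squares fit) before this distance is evaluated.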

Formation of the heterodimer

Figure 3a shows the heterodimer. The two subunits are related
by an approximate two-fold axis. Segments of well defined
β-sheet structure near the cystine knot in each subunit are brought
together in a short seven-stranded β-barrel (Fig. 3b). The
[illegible] (CAGY sequence), long implicated in the formation of
the heterodimer, is the central strand. Away from this central
region, there are a number of hydrogen bonds (data not shown).
There is a short stretch of antiparallel β-sheet (β44-46 and α77-75)
which helps to hold the long loop of the β-subunit to the
hairpin loops of the α-subunit. In addition, there are hydrophobic
contacts between valine and leucine at β44-45 and the triplet
of phenylalanine residues (α17, 18, 74) in the hairpin loop of
the α-subunit. The complete association buries a total surface
area of 4,525 Å² between the subunits: 2,134 Å² by α, 2,241 Å²
[illegible] carbohydrate on the α-subunit [illegible].

An unexpected feature of the association is the role played by
the loop from Cys 90 to Cys 110 in the β-subunit (Fig. 2a). In
the heterodimer this loop is wrapped over the α-subunit while
being covalently bonded to the β-subunit through the disulphide
linkages (38-90, 26-110), and has the appearance of a seat
belt. There is close association between the inner surface of
[illegible] and the α-subunit, with a short strand of parallel
β-sheet between β(99-101) and α(53-57) forming part of the
β-barrel. The side chains of four α-subunit residues form
[illegible] contacts with either main chain or side chain
[illegible]. There are hydrogen bonds between the side
chains of α35 Arg and β106 His and the side chain of α56 Glu
and the main chain of β104 Lys. Non-bonded interactions are
formed between α37 Tyr, β107 Pro and β108 Leu and between
α54 Thr and β99 Asp. There are no contacts specific to the hCG
molecule that would prevent this seat-belt arrangement existing
in all of the glycoprotein hormones.

Formation of Cys 93-100 is necessary for association with the
α-subunit, and a form of hCGβ consisting of residues 8-100
associates only weakly with α 14. The disulphide Cys 26-110 is
completed after the α/β association has occurred 15, indicating
that the seat-belt region is important in maintaining the integrity
of the heterodimer.

Carbohydrate structure 

In its native form, hCG is heavily glycosylated with complex
carbohydrate making up ~34% by weight of the protein. N-linked
carbohydrates occur at Asn 52 and Asn 78 on the α-subunit
and at Asn 13 and Asn 30 on the β-subunit. In addition
the β-subunit has four O-linked carbohydrates at Ser 121, 127,
132 and 138. Treatment of the protein with anhydrous HF for
one hour leaves the O-linked carbohydrate largely intact and
truncates the asparagine-linked carbohydrate to predominantly
Asn-(GlcNAc)₂ (refs 4, 16).

In the crystal structure, electron density is clear for two sugar
residues on each of the α-subunit N-linked carbohydrates (Fig.
4a) and only one sugar residue is seen for those of the β-subunit.
These fragments are sufficient to allow modelling of the complete
carbohydrate (Fig. 4b). In the α-subunit, the carbohydrates are
at the extremities of the molecule: Asn 52 is on the double-stranded
loop and Asn 78 is near the end of the β-hairpins.
The carbohydrate structure makes few hydrogen bonds to the
protein. In the β-subunit, both N-linked carbohydrates are
exposed on the outward faces of adjacent β-strands with Asn 13
and Asn 30 less than 7 Å apart (Fig. 3a). Some of the glycoprotein
hormones have only one of these sites glycosylated.
The only carbohydrate that interacts at the subunit interface
is on α52 Asn, which contacts the β-subunit residues
Tyr 59, Val 62, Phe 64 and Ala 83 as well as Thr 97, from the
determinant loop.

FIG. 4 a, Carbohydrate density at α78 Asn
contoured at a 1.0σ level in the final
2|F_o| − |F_c| difference Fourier, showing clear
density for two N-acetyl-glucosamine units.
b, A model of the complete N-linked carbohydrate
on the hCG molecule illustrates the
extent of the carbohydrate structure. The
protein is represented as surfaces 49 (α,
blue; β, green), with the carbohydrate represented
as a space-filling model with atom
colouring: N, blue; O, red; C, white. The carbohydrate
was modelled by optimizing the
fit of a complex biantennary carbohydrate
from thrombin (coordinates provided by P.
Martin) on the observed sugar residues at
each of the glycosylation sites. The carbohydrate
at α78 is far removed from the
other three sites. The positions of the β13
and β30 carbohydrates are very close,
forming a large bulk of carbohydrate. Some
distal residues of the β carbohydrate could
be close to one of the antennae of the
α52 carbohydrate, which is located at the
dimer interface and is within the proposed
receptor-binding domain. This sugar is best
placed for biological function. c, A stereo
view of the proposed receptor-binding site.
The essential determinant loop (β93-100,
residues in red) is centrally positioned with
the β-subunit long loop (38-57, coloured
by atom type) and Tyr 88 (white) of the
α-subunit C-terminus above. Below is the
carbohydrate (coloured by atom type) at
α52 Asn, with the adjacent α40-50 loop.
Some invariant residues are coloured
magenta. d, Side chains in the determinant
loop β93-100, with colour coding for atom
type. The adjacent helical region of the
α-subunit shows the position of Leu 48,
Val 49 and Lys 51. e, The unusual wedge-shaped
β38-57 long loop extends from
the hairpin loops of the α-subunit. Hydrophobic
residues are predominantly surface-oriented;
hydrophilic residues are either in
the centre of the loop or buried in the α/β
interface.

NATURE · VOL 369 · 9 JUNE 1994

ARTICLES

Receptor binding and biological activity 

A number of residues important for the activity of hCG have
been identified through chemical modification, the use of synthetic
peptides in competitive inhibition studies, and site-directed
mutagenesis (reviewed in refs 2, 17, 18). As a result, several
regions that contribute to receptor binding have been identified
(Fig. 4).

The (β93-100) determinant loop sequence (Fig. 4d). The
importance of this sequence was recognized by Ward and
Moore 19, who proposed that the variable charge in this loop
could act as a determinant of specificity, positive for LH/hCG
and negative for FSH and TSH. The disulphide 93-100 which
forms this loop is important for activity 20, and site-directed
mutagenesis of either Cys 93 or Cys 100 in hCGβ 21 yields mutant
forms incapable of associating with the α-subunit.

The determinant loop is surface-oriented and held in place by
a short stretch of parallel β-sheet between α53-57 and β99-101.
The disulphide β93-100 sits between the α53-56 and β50-57
strands, making non-bonded contacts with α55 Ser and β56 Val,
and constrains the loop to form a single turn of 3₁₀ helix-like
structure between β94 Arg and β98 Thr. This latter residue is
buried at the interface with the α-subunit and its importance in
subunit association has been demonstrated by mutagenesis 22.
Residues of the β-subunit implicated in receptor binding 22
(Arg 94, Arg 95, Ser 96 and Asp 99) are accessible at the dimer
surface. Thr 97, generally insensitive to mutagenesis, is in close
contact with the carbohydrate of α52 Asn.
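
The charge argument for the determinant loop can be made concrete with a simple residue count. The sketch below assembles the hCG β93-100 sequence from the residues named in this section (Cys 93, Arg 94, Arg 95, Ser 96, Thr 97, Thr 98, Asp 99, Cys 100) and counts formal side-chain charges, treating Lys and Arg as +1 and Asp and Glu as −1 and ignoring His and the termini; that charge model and the function name are assumptions of this illustration.

```python
POSITIVE = frozenset("KR")  # Lys, Arg counted as +1
NEGATIVE = frozenset("DE")  # Asp, Glu counted as -1


def net_formal_charge(peptide: str) -> int:
    """Net side-chain charge from a simple count of K/R minus D/E.

    His and the chain termini are ignored (an assumption of this sketch).
    """
    positives = sum(1 for residue in peptide if residue in POSITIVE)
    negatives = sum(1 for residue in peptide if residue in NEGATIVE)
    return positives - negatives


# hCG beta 93-100, assembled from the residues named in the text.
HCG_BETA_93_100 = "CRRSTTDC"
print(net_formal_charge(HCG_BETA_93_100))
```

The same count applied to the corresponding FSH and TSH loops would be needed to check the negative charge proposed for those hormones; their sequences are not given in this section.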
The (β38-57) sequence (Fig. 4e). Much attention has been
focused on the longest inter-cysteine loop β38-57. Synthetic peptides
with this sequence stimulate steroidogenesis in rat Leydig
cells 23 and strongly inhibit binding of whole hCG 23.

The crystal structure shows an unusual loop with residues 40-46
buried at the α/β interface (Fig. 3). Mutagenesis of Arg 43
to Leu (as in FSH) in hCG 24, or replacement of it by Ala or
Asp in synthetic (38-57) peptides 25, either significantly diminishes
binding or eliminates it. A natural mutation in LH of Gln 54 to
Arg causes hypogonadism and the mutant hormone does not
bind to its receptor 26. Residues 47-53 are predominantly hydrophobic
and exposed at the surface, forming a wedge-shaped
extrusion (Fig. 4e). This unusual structure, by virtue of its composition
and proximity to the determinant loop, is likely to be
important in receptor binding. The orientation of Arg 43 suggests
that it may stabilize the conformation of the loop, although
direct interaction with the receptor cannot be excluded.
The (α88-92) C-terminus. Three of the five C-terminal residues
(Tyr-Tyr-His-Lys-Ser-COOH) are identical in 13 of the 18
available amino-acid sequences. Carboxypeptidase digestion of
both hCGα and LHα, although not detrimental to subunit
assembly, essentially eliminates receptor binding 27,28. hCGα
lacking residues 89-92 forms an active heterodimer but with
reduction of both receptor binding and steroidogenesis 29.

Although residues 89-92 could not be positioned, the C-terminus
of the α-subunit should be located close to both the β38-57
sequence loop and the determinant loop of the β-subunit,
forming a composite receptor-binding site (Fig. 4c). The side
chain of α88 Tyr packs between the disulphides β93-100 and
α59-87. Mutagenesis of this residue to Phe causes reduced
binding affinity but increases the steroidogenic response at saturating
concentrations 29. As it is unlikely to be involved directly
in receptor binding, this result may be due to subtle differences
in conformation induced in the final four C-terminal residues.
Other potential α-subunit residues. The α-subunit residues 40-50
contain the only helical structure in the protein. The helix
consists of highly conserved residues and is adjacent to the determinant
loop (Fig. 4c, d), suggesting a receptor-binding role. This
is supported by recent work 30 implicating Lys 44 in receptor
binding. The structure further suggests that the invariant Leu 48
and Val 49, which protrude into solvent at a turn, and Lys 51,
which extends up towards the determinant loop, also play a part.
Carbohydrate. As well as modifying the rate of clearance of
the glycoprotein hormone from the body, some glycosylation
is essential for full biological activity 31. The LH/CG receptor
contains, within the extracellular domain, a sequence similarity
to soybean lectin 37, which suggests that there is a binding site
for carbohydrate. HF-hCG binds to receptors with slightly
increased affinity but with loss or reduction of potency 33. Studies
on the function of the carbohydrate show that the N-glycosylation
sites are more important than the O-linked sites; also, glycosylation
is not necessary for receptor recognition, and
N-glycosylation of the α-subunit at Asn 52 is critical for signal
transduction 34,35.

FIG. 5 a, A Cα trace of hCG-β, coloured according to variation in the
amino-acid sequence of the other human glycoprotein hormones. A log
odds matrix 50 was used to define non-conservative (red) and conservative
(cyan) sequence changes; invariant residues are coloured green,
deletions white. The conserved cystine side chains are also shown. b,
Schematic drawings of structures of hCG-β (green) compared to PDGF-β
(tan), TGF-β (magenta) and NGF (blue) (coordinates from the Brookhaven
Protein Data Bank 51; not given for residues 27-37 in PDGF-β).
The N termini and C termini of hCG-β and TGF-β have been truncated
to simplify the diagram. The structures share the same cystine-knot
motif, imposing strong conformational restraints on associated residues.
There are significant changes in the size and shape of the hairpin
loops, but the largest differences are seen in the long loop. Positions
where hCG-β is cleaved to form the β-core fragment are shown in red
and highlight the similarity between this fragment and PDGF-β.

The carbohydrate at α52 Asn is positioned at the dimer interface,
about 4 Å away from the determinant loop (Fig. 4c). Its
position, size and flexibility could be antagonistic to a peptide-based
receptor binding, thereby explaining how removing carbohydrate
increases receptor binding. Removal of both β-subunit
N-glycosylation sites does not cause much loss of activity, but
reduces the maximal steroidogenic activity 36. Although the two
Asn residues are very close to each other and located 20 Å from
α52 Asn, modelling with the complete complex carbohydrate
shows that some of the distal sugar residues on β13, β30 and
α52 could be close together (Fig. 4b).

Deglycosylated hCG can be converted back into an active
form when bound to polyclonal antibodies 31,37, suggesting that
a modified conformation is restored to the native form by the
bound antibody 31. As crystals of asialo-hCG are isomorphous
with HF-hCG, there is no evidence in the crystal structure for
conformational change caused by deglycosylation. The structure
does support the alternative explanation 37 that the antibodies
may mimic the steric interaction of the carbohydrate with the
receptor.

The family of glycoprotein hormones 

It is evident that the structures of the β-subunits of the other
members of this family (LH, FSH, TSH) will be essentially similar
to the structure described here. Figure 5a shows the positions
of the non-conserved, conserved and invariant residues for each
of the human hormones when compared to hCG. LH has only
a few random non-conserved changes, as expected. FSH and
TSH both have functionally distinct receptors and less sequence
similarity. The non-conservative sequence changes are concentrated
in the receptor-binding regions, namely the 38-57 long
loop and the 93-100 determinant loop. There are unexpected
changes in regions associated with the α-subunit, predominantly
in the 100-110 seat belt and 39-47 region of the long loop.
Conserved and invariant residues are concentrated in the two
hairpin loops and the cystine knot, implying that these are
important in maintaining the fold. Charged surface residues are
also conserved in the hairpin loops, indicating that they may
serve some common function for these hormones.

Cystine-knot growth factors

The correct assignment of the disulphide bridges and the determination
of the tertiary fold of hCG reveals, surprisingly, that
the glycoprotein hormones are members, with NGF, TGF-β
and PDGF-β, of the superfamily of cystine-knot growth factors.
In this family the proteins form either homodimers or heterodimers
with homologous cystine-knot subunits. In contrast, the
glycoprotein hormones form only heterodimers between a common
α-subunit and different β-subunits. Structural comparisons
with TGF-β, PDGF-β and NGF show that the cystine-knot
motif allows variation in the size and conformation of the constituent
loops, but there is a pronounced similarity between part
of hCG-β and PDGF-β (Fig. 5b). Whereas hCG is found mainly
as a heterodimer in the early stages of pregnancy, in the later
stages both subunits are found in large quantities as separate
molecules. Free α-subunit is more heavily glycosylated and has
hormonal activity 38. Smaller forms of the β-subunit that have
been considered degradation products are found. The smallest
of these, β-core 39, is composed of two polypeptides, β6-40 and
β55-92, which arise from excision of 15 residues from the long
loop as well as truncation at the N and C termini. The structural
similarity of this truncated β-molecule with PDGF is striking
and suggests that the circulating β-core molecule has a defined
tertiary structure and perhaps a biological function.